Abstract

We consider the problem of constructing a single spanning tree for the single-source buy-at-bulk network design problem for doubling-dimension graphs. We compute a spanning tree to route a set of demands (or data) along a graph to or from a designated root node. The demands could be aggregated at (or symmetrically distributed to) intermediate nodes, where the fusion cost is specified by a non-negative concave function $f$. We describe a novel approach for developing a spanning tree that is oblivious in the sense that it is independent of the number of data sources (or demands) and of the cost function at intermediate nodes.

To our knowledge, this is the first paper to propose a single spanning tree solution to this problem (as opposed to multiple overlay trees). There has been no prior work where the tree is oblivious to both the fusion cost function and the set of sources (demands). We present a deterministic, polynomial-time algorithm for constructing a spanning tree in graphs of low doubling dimension that guarantees an $O(\log^3 D \cdot \log n)$-approximation of the optimal cost, where $D$ is the diameter of the graph and $n$ the total number of nodes. With a constant fusion-cost function, our spanning tree gives an $O(\log^3 D)$-approximation for every Steiner tree to the root.



1 Introduction 

Buy-at-bulk network design problems arise in scenarios where economies of scale apply, or where the availability of capacity in discrete units results in concave cost functions on the edges. As observed in [Chekuri et al. 2006], a commonly seen application is in telecommunication networks, where bandwidth on a link can be purchased in discrete units $u_1 < u_2 < \cdots < u_n$ with respective costs $c_1 < c_2 < \cdots < c_n$. Economies of scale exhibit the property that the cost per unit of bandwidth decreases as the number of units purchased grows: $c_1/u_1 > c_2/u_2 > \cdots > c_n/u_n$. This behavior justifies the sale of network capacity "wholesale" (or at a "volume discount"), where the more capacity is bought, the cheaper the price per unit of bandwidth.

We study the single-source buy-at-bulk (SSBB) network design problem with the following constraints: an unknown number of source (or demand) nodes and an unknown concave transportation cost function $f$. An abstraction of this problem can be found in many applications, one of which is data fusion in wireless sensor networks, where these parameters are unknown or vary over time. Others include the design of VLSI power circuitry and transportation and logistics (railroad, water, oil, and gas pipeline construction). For simplicity, we consider the data fusion problem in communication networks, though SSBB can also be applied to data distribution problems; our solution holds for both cases.

As mentioned in [Goel and Estrin 2003], if information flows from $k$ different sources over a link, then the total information that needs to be transmitted is $f(k)$, where the function $f$ is called a canonical fusion function: $f$ is concave, non-decreasing, and $f(0) = 0$. If $f$ is known in advance, the problem of building an optimal fusion tree is well understood [Bartal 1998].
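To make the three defining conditions concrete, here is a minimal Python sketch that tests a candidate $f$ on a finite range of integers. The helper name `is_canonical` and the example functions are illustrative assumptions, and checking a finite range is only a sanity check, not a proof of concavity.

```python
import math
from typing import Callable


def is_canonical(f: Callable[[int], float], k_max: int = 1000, eps: float = 1e-9) -> bool:
    """Check f(0) = 0, monotonicity and discrete concavity of f on 0..k_max."""
    if abs(f(0)) > eps:
        return False
    values = [f(k) for k in range(k_max + 1)]
    steps = [b - a for a, b in zip(values, values[1:])]  # increments f(k+1) - f(k)
    non_decreasing = all(d >= -eps for d in steps)
    # Concave on the integers: the increments never grow.
    concave = all(later <= earlier + eps for earlier, later in zip(steps, steps[1:]))
    return non_decreasing and concave


examples = {
    "no fusion, f(k) = k": lambda k: float(k),
    "perfect fusion, f(k) = min(k, 1)": lambda k: float(min(k, 1)),
    "partial fusion, f(k) = sqrt(k)": math.sqrt,
    "convex, f(k) = k^2": lambda k: float(k * k),
}
for name, f in examples.items():
    print(f"{name}: {is_canonical(f)}")
```

The first three functions pass; the convex one fails, as expected.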



However, here we consider the case where both $f$ and the number of data sources are unknown. The focus of this paper is to construct an oblivious spanning tree according to the following definition:

Definition 1.1 ($k$-Oblivious Spanning Tree ($k$-OST)). A spanning tree $T$ with root node $s$ in a graph $G$ is $k$-oblivious if it provides a $k$-factor approximation to any data fusion problem toward sink node $s$ with an arbitrary set of data sources and an arbitrary canonical fusion function $f$.
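The quantity that a $k$-OST approximates can be made concrete with a small evaluator. The sketch below assumes the usual data-fusion cost model (each tree edge is charged its weight times $f$ of the number of sources whose data crosses it); the function name `fusion_cost` and the parent-pointer representation are illustrative, and the approximation factor itself would additionally require the optimal cost, which this sketch does not compute.

```python
from typing import Callable, Dict, Hashable, Iterable


def fusion_cost(parent: Dict[Hashable, Hashable],
                weight: Dict[Hashable, float],
                sources: Iterable[Hashable],
                f: Callable[[int], float]) -> float:
    """Cost of routing the data of `sources` to the root of a tree.

    parent[v] is v's parent (the root has no entry); weight[v] is the weight of
    edge (v, parent[v]). Each edge is charged weight * f(number of sources below it).
    """
    load = {v: 0 for v in parent}
    for src in set(sources):
        v = src
        while v in parent:  # walk up until the root is reached
            load[v] += 1
            v = parent[v]
    return sum(weight[v] * f(load[v]) for v in parent)


# Toy tree: s <- a <- b and s <- a <- c (unit weights), with sources b and c.
parent = {"a": "s", "b": "a", "c": "a"}
weight = {"a": 1.0, "b": 1.0, "c": 1.0}
print(fusion_cost(parent, weight, ["b", "c"], lambda k: k))                  # 4.0
print(fusion_cost(parent, weight, ["b", "c"], lambda k: 1 if k > 0 else 0))  # 3.0
```

The same tree is charged differently for different $f$, which is exactly why a single tree that is good for every canonical $f$ is non-trivial.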

In this paper, we consider building an oblivious spanning tree for doubling-dimension graphs. Doubling-dimension graphs have been used in many different contexts, including compact routing in wired networks [Abraham et al. 2006; Konjevod et al. 2008], the traveling salesman problem, navigability, and problems related to modeling the structural properties of the Internet distance matrix for distance estimation [Kleinberg et al. 2009; Fraigniaud 2007]. As noted in [Fraigniaud et al. 2006], the doubling dimension has become a key concept for measuring the ability of a network to support efficient algorithms or to realize specific tasks efficiently. For wireless networks, this concept has found uses in solving distributed communication problems [Kuhn et al. 2005], distributed resource management [Gao et al. 2009], information exchange among producers and consumers [Funke et al. 2006], and in determining other performance qualities such as energy conservation in wireless sensor networks [Pemmaraju and Pirwani 2006].

SSBB is NP-hard, since the Steiner tree problem is its special case (when $f(x) = 1$ for $x \ge 1$) [Salman et al. 2000].



1.1 Contribution 

We build a single oblivious spanning tree for doubling-dimension graphs, such that the tree is independent of the data sources and can accommodate any canonical fusion-cost function. Our approach gives a deterministic, polynomial-time algorithm that guarantees an $O(2^{10\rho} \log^3 D \log n)$-OST (measured as the total communication cost involved), where $\rho$ is the doubling dimension. For constant fusion-cost functions, the $k$-OST behaves like a $k$-oblivious Steiner tree. We define a $k$-oblivious Steiner tree as a Steiner tree that provides a $k$-approximation to any data fusion problem towards the sink with an arbitrary number of data sources. For a constant fusion-cost function ($f(\cdot) = c$), we obtain an $O(2^{10\rho} \log^3 D)$-oblivious Steiner tree to sink $s$.

Our spanning tree construction is based on the following techniques. We partition the nodes in a hierarchical fashion. The nodes selected for a given 'level' $i$ of the hierarchy are chosen so that their mutual distances are at least $2^i$, so the separation grows exponentially with the level. Nodes of successive levels are connected by shortest paths. The intersecting paths are appropriately modified to result in a spanning tree. A modified tree is then built from the spanning tree to ensure that all paths have appropriate end-nodes. The analysis is done on this modified tree.



1.2 Related Work 

SSBB problems have been considered primarily in the Operations Research and Computer Science literatures in the context of flows with concave costs. The SSBB problem was first introduced by [Salman et al. 2000], and an $O(\log^2 n)$-approximation was given by [Awerbuch and Azar 1997] using metric tree embedding. [Bartal 1998] further improved this result to $O(\log n \log\log n)$. [Guha et al. 2001] provided the first constant-factor approximation to the problem.

[Goel and Estrin 2003] build an overlay tree on graphs that satisfy the triangle inequality, based on a maximum matching algorithm that guarantees a $(1 + \log k)$-approximation, where $k$ is the number of sources. An overlay tree, if projected onto a graph, may not be a tree (it could have cycles). In a related paper, [Goel and Post 2009] construct (in polynomial time) a set of overlay trees from a given general graph such that the expected cost of a tree for any $f$ is within an $O(1)$-factor of the optimum cost for that $f$.

Table 1: Our results and comparison with previous results for data-fusion schemes. IAF - Independent of fusion function, ISS - Independent of source-set, n is the total number of nodes in the topology, k is the total number of source nodes.

[Jia et al. 2006] build the Group Independent Spanning Tree Algorithm (GIST), which constructs an overlay tree for randomly deployed nodes in 2D. The tree (which is oblivious to the number of data sources) simultaneously achieves an $O(\log n)$-approximate fusion cost and an $O(1)$-approximate delay. However, their solution assumes a constant fusion cost function. We summarize and compare the related work in Table 1.



[Jia et al. 2005] provide approximation algorithms for the TSP, Steiner tree, and set cover problems. They present a polynomial-time $(O(\log n), O(\log n))$-partition scheme for general metric spaces. An improved partition scheme for doubling metric spaces is also presented, which incorporates constant-dimensional Euclidean spaces and growth-restricted metric spaces. The authors present a polynomial-time algorithm for the Universal Steiner Tree (UST) problem that achieves polylogarithmic stretch, with an approximation guarantee of $O(\log^4 n / \log\log n)$ for arbitrary metrics, and derive a logarithmic stretch, $O(\log n)$, for any doubling, Euclidean, or growth-restricted metric space over $n$ vertices.

An earlier version of this work, with preliminary results in the context of data aggregation, appeared as a brief announcement in [Srinivasagopalan et al. 2009].



2 Definitions 

Consider a weighted graph $G = (V, E, w)$, $w : E \to \mathbb{Z}^+$. Let $s \in V$ be the sink node. For any two nodes $u, v \in V$, let $\mathrm{dist}_G(u, v)$ denote the distance between $u$ and $v$ (measured as the total weight of the shortest path that connects $u$ and $v$). Let $D$ denote the diameter of $G$, that is, $D = \max_{u, v \in V} \mathrm{dist}_G(u, v)$. Given a subset $V' \subseteq V$, we denote by $\mathrm{dist}_G(u, V')$ the smallest distance between $u$ and any node in $V'$. We also define $\mathrm{near}_G(u, V') = \{v \in V' : \mathrm{dist}_G(u, v) = \mathrm{dist}_G(u, V')\}$ to be the set of nodes in $V'$ that $u$ is closest to. Given a set of nodes $H \subseteq V$ and a parameter $d$, we define the Maximal Independent Set of $G$ for distance $d$, $I = \mathrm{MIS}(G, H, d)$, to be an arbitrary maximal set of nodes in $G$ such that $H \subseteq I$ and, for any $u, v \in I \setminus H$, $\mathrm{dist}_G(u, v) \ge d$ and $\mathrm{dist}_G(u, x) \ge d$ for every $x \in H$. There is a simple greedy algorithm to compute it. A special case is the notion of $d$-independent nodes, computed by $I = \mathrm{MIS}(G, d)$.
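The greedy algorithm mentioned above can be sketched as follows. This is a minimal version assuming unit edge weights (hop distances, as in Definition 2.1) and $d \ge 1$; the names `ball` and `greedy_mis` are illustrative. It keeps $H$, then scans the nodes once and adds each node that is not within $d - 1$ hops of a node already chosen, which yields a maximal set whose new members are pairwise at least $d$ apart and at least $d$ away from $H$.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set

Graph = Dict[Hashable, List[Hashable]]  # adjacency lists of an unweighted graph


def ball(G: Graph, center: Hashable, radius: int) -> Set[Hashable]:
    """All nodes within `radius` hops of `center` (breadth-first search)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for w in G[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def greedy_mis(G: Graph, H: Iterable[Hashable], d: int) -> List[Hashable]:
    """MIS(G, H, d) for d >= 1: keep H, then add every node that is at least
    d hops away from all nodes chosen so far."""
    chosen = list(H)
    blocked: Set[Hashable] = set()
    for h in chosen:
        blocked |= ball(G, h, d - 1)
    for v in G:  # dict insertion order, so the result is deterministic
        if v not in blocked:
            chosen.append(v)
            blocked |= ball(G, v, d - 1)
    return chosen


# A path 0 - 1 - ... - 6.
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
print(greedy_mis(line, [], 3))   # [0, 3, 6]
print(greedy_mis(line, [2], 3))  # [2, 5]
```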

We adapt the definition of a doubling-dimension graph from [Nieberg 2006; Gupta et al. 2003].



Definition 2.1 ($r$-Neighborhood). Given a graph $G = (V, E)$, the $r$-neighborhood of any vertex $u \in V$, $N_G(u, r)$, is defined as the set of nodes whose distance (in hops) from $u$ is at most $r$: $N_G(u, r) = \{v \mid \mathrm{dist}_G(u, v) \le r\}$.

Definition 2.2 (Doubling Dimension Graph). Let $\rho$ be the smallest value such that every $R$-neighborhood of a graph $G$ is completely covered by at most $2^\rho$ neighborhoods of radius $R/2$; $\rho$ is called the doubling dimension of $G$. If $\rho$ is bounded by a small constant, we say that $G$ is doubling and has low dimension.

Observation 2.3. In a doubling-dimension graph, any $R$-neighborhood is covered by at most $m$ neighborhoods of radius $R/2^j$, where $m = 2^{\rho \log (R / (R/2^j))} = 2^{\rho j}$, $j > 0$.
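The observation follows by applying Definition 2.2 $j$ times, each step replacing every ball by at most $2^\rho$ balls of half the radius. The derivation below spells this out; the centers $u_a$ (which differ from one union to the next) are notation introduced here only for illustration:

```latex
N_G(u, R) \subseteq \bigcup_{a=1}^{2^{\rho}} N_G\!\left(u_a, \tfrac{R}{2}\right)
\subseteq \cdots \subseteq
\bigcup_{a=1}^{(2^{\rho})^{j}} N_G\!\left(u_a, \tfrac{R}{2^{j}}\right),
\qquad
(2^{\rho})^{j} = 2^{\rho j} = 2^{\rho \log_2 \frac{R}{R/2^{j}}} .
```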



3 Spanning Tree Construction 

We start with an informal description of the construction of the spanning tree. We build the tree in a hierarchical manner with $\kappa = O(\log D)$ levels. At level $\kappa = \lceil \log D \rceil$ is the sink ($I_\kappa = \{s\}$), and at level 0 are the individual nodes ($I_0 = V$). Each level $i$ of the hierarchy is built by identifying a set of independent nodes, $I_i = \{\ell_i^1, \ell_i^2, \ldots, \ell_i^{n_i}\}$ (where $n_i > 0$ and the subscript corresponds to the level), that are at least $2^i$ apart (in hops). The hierarchy consists of the levels of independent nodes $I_0, \ldots, I_\kappa$, where $I_i \subseteq V$, $0 \le i \le \kappa$. Members of $I_i$ are also called leaders of level $i$. Some leaders could belong to more than one level (e.g., the sink $s$). Leaders of consecutive levels are connected by shortest paths to build a tree. A formal description appears in Algorithm 1.

The construction of the hierarchical levels of independent nodes is top-down. $I_i$ is computed by $\mathrm{MIS}(G, I_{i+1}, 2^i)$, for $0 \le i \le \kappa - 1$. $I_i$ will contain all the $2^j$-independent nodes of higher levels $j$, $i < j \le \kappa$, as well as a $2^i$-independent set of nodes. We enforce the constraint that $s \in I_i$ for every $I_i$. Note that each node $v \in I_i \setminus I_{i+1}$ has to be within distance $2^{i+1} - 1$ of at least one node in $I_{i+1}$ (otherwise $v$ would be a member of $I_{i+1}$).

Paths are also constructed in a top-down fashion. A path $p_i$ at any level $i$ starts at some leader at level $i$ and ends at a leader at level $i + 1$. The set of all paths at level $i$ is denoted by $P_i$, and the set of all paths of all levels is denoted by $P = \{P_{\kappa-1}, P_{\kappa-2}, \ldots, P_1, P_0\}$. We begin constructing paths from level $\kappa - 1$ to sink $s$ by computing shortest paths from each node of $I_{\kappa-1}$ to $s$. We continue constructing the spanning tree by computing the paths for the remaining levels. Suppose we have constructed the paths for level $i$; we now construct $P_{i-1}$. Each node $v \in I_{i-1}$ prefers an arbitrary node in $\mathrm{near}_G(v, I_i)$; if $s \in \mathrm{near}_G(v, I_i)$, then $s$ is preferred. Shortest paths from each node $v$ to their respective leaders are then constructed.

When the paths for all levels are built, the resulting structure may not be a tree: it could be a graph with intersecting paths. Define regular paths as paths that do not intersect any (higher-level) path on their way to their end-nodes. The paths of $P_{\kappa-1}$ are regular paths, since there are no higher-level paths for them to intersect, and they are included in $P^{reg}$.

Define pruned paths as those paths that intersect paths of a higher level. If a path $p_i$ intersects a path $p_j$ ($j > i$) on its way to its leader, then $p_i$ is pruned from the intersection point to its destination. Such paths are included in $P^{pr}$. This pruning of intersecting paths ensures the structural property of a spanning tree (see Figure 1).
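The regular/pruned classification can be sketched for a single path as follows, assuming paths are lists of nodes from their start node to their leader and that the nodes of all higher-level paths are available as a set; `prune` and the toy node names are hypothetical. A path whose only contact with the higher-level structure is its own leader stays regular; otherwise it is cut right after the first intersection point.

```python
from typing import Hashable, List, Set, Tuple


def prune(path: List[Hashable], higher: Set[Hashable]) -> Tuple[str, List[Hashable]]:
    """Classify a level-i path (start node first, leader last) against higher-level nodes.

    Returns ("regular", path) if the first higher-level node it meets is its own leader,
    otherwise ("pruned", prefix up to and including the first intersection point).
    """
    for idx in range(1, len(path)):
        if path[idx] in higher:
            if idx == len(path) - 1:
                break  # it touches the higher-level structure only at its leader
            return "pruned", path[: idx + 1]
    return "regular", list(path)


higher = {"s", "a", "b", "L"}  # nodes already on paths of levels > i (toy data)
print(prune(["v", "x", "b", "y", "L"], higher))  # ('pruned', ['v', 'x', 'b'])
print(prune(["w", "z", "L"], higher))            # ('regular', ['w', 'z', 'L'])
```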
Algorithm 1: Spanning Tree

Input: Graph G with sink s.
Output: A spanning tree T_s.

P ← ∅; I_κ ← {s};   // κ ← ⌈log D⌉
P^reg ← ∅; P^pr ← ∅;   // lists of regular and pruned paths
foreach level i = κ − 1 down to 0 do
    I_i ← MIS(G, I_{i+1}, 2^i);
    foreach v ∈ I_i do
        // v chooses the nearest ℓ ∈ I_{i+1} as its parent
        ℓ ← near_G(v, I_{i+1});   // if s ∈ near_G(v, I_{i+1}), s is chosen
        p_v ← shortest path from v to ℓ;
        if p_v intersects any path at level > i at point u then
            // prune p_v by removing the segment from u to ℓ
            p'_v ← path segment from v to u;
            P^pr ← P^pr ∪ {p'_v};
        else
            P^reg ← P^reg ∪ {p_v};
P ← P^reg ∪ P^pr;
return T_s formed by the paths in P;
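A compact Python sketch of Algorithm 1 follows, under several simplifying assumptions that are not part of the original: unit edge weights (hop distances via BFS), all-pairs BFS (only suitable for small graphs), the MIS taken greedily in dictionary order, and pruning against every node already in the tree, which also merges intersecting same-level paths instead of re-routing them as described in the text. The function names are illustrative.

```python
import math
from collections import deque


def bfs(G, src):
    """Hop distances and BFS parent pointers from src (adjacency-dict graph)."""
    dist, par = {src: 0}, {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in G[u]:
            if w not in dist:
                dist[w], par[w] = dist[u] + 1, u
                queue.append(w)
    return dist, par


def spanning_tree(G, s):
    """Sketch of Algorithm 1: returns the levels I_kappa..I_0 and the tree as child -> parent."""
    bfs_from = {v: bfs(G, v) for v in G}  # all-pairs BFS; small graphs only
    D = max(max(dist.values()) for dist, _ in bfs_from.values())
    kappa = max(1, math.ceil(math.log2(D)))
    levels = {kappa: [s]}
    parent = {}
    in_tree = {s}
    for i in range(kappa - 1, -1, -1):
        # I_i = MIS(G, I_{i+1}, 2^i): keep I_{i+1}, add nodes >= 2^i hops from all chosen.
        chosen = list(levels[i + 1])
        for v in G:
            if all(bfs_from[c][0][v] >= 2 ** i for c in chosen):
                chosen.append(v)
        levels[i] = chosen
        for v in chosen:
            if v in in_tree:
                continue  # already connected through a higher-level (or earlier) path
            dist_v, par_v = bfs_from[v]
            best = min(dist_v[u] for u in levels[i + 1])
            nearest = [u for u in levels[i + 1] if dist_v[u] == best]
            leader = s if s in nearest else nearest[0]  # the sink is preferred
            path = [leader]  # rebuild the shortest path from BFS parents rooted at v
            while path[-1] != v:
                path.append(par_v[path[-1]])
            path.reverse()  # v -> ... -> leader
            # Prune at the first node already in the tree (simplification, see lead-in).
            for idx in range(1, len(path)):
                if path[idx] in in_tree:
                    path = path[: idx + 1]
                    break
            for child, par in zip(path, path[1:]):
                parent[child] = par
                in_tree.add(child)
    return levels, parent


# Usage on a 5 x 5 grid with the sink in a corner.
grid = {(r, c): [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= r + dr < 5 and 0 <= c + dc < 5]
        for r in range(5) for c in range(5)}
levels, parent = spanning_tree(grid, (0, 0))
print(len(parent), "tree edges;", {i: len(nodes) for i, nodes in levels.items()})
```

Because every new path segment consists of nodes not yet in the tree and ends at a node already in it, the structure stays a tree at every step; this is the property the pruning step in the text is meant to guarantee.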



Note that regular paths of the same level could intersect and continue in different directions to reach a common leader. In this case, one of the paths is modified to use the same segment as the other after the intersection point. Another scenario is when two paths (say from $u$ and $v$ of level $i$) intersect at $m$ and proceed to their respective end-nodes $x$ and $y$. In this case, either $u$ or $v$ chooses a common leader and modifies its path accordingly. In both scenarios, the resulting paths remain regular even if they overlap. Note that in both cases, the path segments after the intersection should have the same length.

The spanning tree algorithm executes in $\kappa$ rounds, and each round computes an MIS (in time linear in the size of the graph, assuming that the input is given as an adjacency list) and shortest paths (in $O(|E| \log n)$ time). This amounts to a total running time of $O(\log D \cdot |E| \log n)$.

4 Modified Tree Construction 

The pruned paths in the spanning tree $T$ will not have leaders as end-nodes. To ensure that the end-nodes of all paths are leaders, we modify $T$ into $\tilde{T}$. We begin with an overview of the modified tree construction. We construct $\tilde{T}$ from $T$ by assigning alternate leaders to those paths whose 'upper' sections have been pruned. We first assign levels to all the nodes of regular paths by AssignLevels (see appendix) and include those paths in $\tilde{T}$. Then we begin a top-down, level-by-level process in which we 'modify' the pruned paths by extending them to their newly assigned alternate leaders. Note that a modified path could be a concatenation of multiple pruned paths. We then assign levels to the nodes of each newly modified path as well and include this modified path in $\tilde{T}$. The end of this process results in the tree $\tilde{T}$. A more formal description appears in Algorithm 2.

Define AssignLevels($p_i$, $H$, $i$), where $H$ is the pair of end-nodes of $p_i$, to assign levels to all the nodes of $p_i$ by identifying maximal independent nodes (excluding the end-nodes of $p_i$). Levels are assigned in the range $(i - 1)$ down to $0$. A modified path is connected to an alternate leader, called a pseudo-leader, by the function ModifyPath($p_\omega$, $p_i$), which chooses the nearest level-$(\omega + 1)$ node on $p_i$ from the intersection point.

Algorithm 2: Modified Tree

Input: Spanning tree T rooted at s.
Output: A modified tree T̃.

T̃ ← ∅;   // T = P = {P_{κ−1}, P_{κ−2}, ..., P_1, P_0}
// Assign levels to all nodes in all regular paths in T.
i ← κ − 1;   // start from the second level from the top
while i ≥ 0 do
    foreach p_i ∈ P_i^reg do
        // v_a and v_b are the start and end nodes of path p_i
        H ← {v_a, v_b};   // v_a is at the same level as p_i
        AssignLevels(p_i, H, i);
        T̃ ← T̃ ∪ {p_i};
    i ← i − 1;
// Pruned paths in T: modify the paths and assign levels.
ω ← κ − 2;
while ω ≥ 0 do
    foreach p_ω ∈ P_ω^pr do
        p_ω ← ModifyPath(p_ω, p_i);   // p_ω intersects p_i with i > ω, and v'_b is the elected
                                      // pseudo-leader; p_i may itself be a modified path
        T̃ ← T̃ ∪ {p_ω};
        H ← {v_a, v'_b};   // v_a and v'_b are the start and end nodes of p_ω
        AssignLevels(p_ω, H, ω);
    ω ← ω − 1;
return T̃;
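AssignLevels can be sketched on a single path as follows. This minimal version assumes unit edge weights (distances are measured along the path) and, as an assumption mirroring $I_i = \mathrm{MIS}(G, I_{i+1}, 2^i)$ in the main hierarchy, lets the end-nodes $H$ and the nodes already picked for level $t+1$ seed the MIS for level $t$, so each node keeps the highest level at which it is picked. The name `assign_levels` is illustrative.

```python
from typing import Dict, Hashable, List


def assign_levels(path: List[Hashable], i: int) -> Dict[Hashable, int]:
    """Assign levels i-1, ..., 0 to the interior nodes of a path (unit edge weights).

    Assumption: the end-nodes H and the nodes picked for level t+1 seed the MIS for
    level t, so every interior node keeps the highest level at which it is picked.
    """
    seeds = [0, len(path) - 1]  # positions of the end-nodes H
    level: Dict[int, int] = {}
    for t in range(i - 1, -1, -1):
        for pos in range(1, len(path) - 1):
            # 2^t-independent of every seed chosen so far (distance along the path)
            if all(abs(pos - q) >= 2 ** t for q in seeds):
                seeds.append(pos)
                level[pos] = t
    return {path[pos]: t for pos, t in level.items()}


print(assign_levels(list("abcdefghi"), 3))
# {'e': 2, 'c': 1, 'g': 1, 'b': 0, 'd': 0, 'f': 0, 'h': 0}
```

At $t = 0$ every remaining interior node qualifies, so all interior nodes receive a level.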

Consider that we are at some level $\omega$, where $0 \le \omega < \kappa - 1$, and suppose that there are several pruned paths in $P_\omega^{pr}$. Let $p_\omega \in P_\omega^{pr}$ be one such path, and let $y \in p_i$ be the intersection point, where $i > \omega$. A pseudo-leader $v'_{\omega+1}$ is chosen on $p_i$ using ModifyPath($p_\omega$, $p_i$) (see appendix). Note that this may alter $I_{\omega+1}$ by replacing the original leader with the pseudo-leader. The path $p_\omega$ is extended from $y$ to $v'_{\omega+1}$, and this new extended path $p'_\omega$ replaces $p_\omega$ in the modified tree $\tilde{T}$. Once a new path $p'_\omega$ is established, all the nodes in it are assigned levels using AssignLevels($p'_\omega$, $H$, $\omega$), where $H$ is the set of end-nodes of $p'_\omega$. This procedure of modifying pruned paths, replacing the old pruned paths by the new extended paths, and assigning levels to all nodes in those paths is repeated for all levels down to 0. The result is a modified tree with normal leaders for regular paths and pseudo-leaders for modified paths.
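The pseudo-leader selection in ModifyPath can be sketched as follows. The sketch assumes that a node whose level is at least $\omega + 1$ qualifies as a level-$(\omega+1)$ pseudo-leader (since leaders of higher levels also belong to lower levels), that distance is measured along $p_i$, and that ties are broken toward the upper end of $p_i$; the tie rule, the name `modify_path`, and the toy levels are assumptions for illustration.

```python
from typing import Dict, Hashable, List, Tuple


def modify_path(p_w: List[Hashable], p_i: List[Hashable],
                level: Dict[Hashable, int], w: int) -> Tuple[List[Hashable], Hashable]:
    """Extend a pruned level-w path to a pseudo-leader on the higher-level path p_i.

    p_w runs from its start x to the intersection point y (its last node), and y lies
    on p_i. Returns the extended path and the chosen pseudo-leader.
    """
    iy = p_i.index(p_w[-1])
    candidates = [j for j, node in enumerate(p_i) if level.get(node, -1) >= w + 1]
    # Nearest candidate along p_i; ties go toward the end (upper) node of p_i.
    j = min(candidates, key=lambda c: (abs(c - iy), -c))
    step = 1 if j >= iy else -1
    extension = [p_i[k] for k in range(iy + step, j + step, step)]
    return p_w + extension, p_i[j]


p_i = ["a", "b", "y", "c", "d", "e"]  # path from a (level 3) to e (level 4)
level = {"a": 3, "b": 0, "y": 0, "c": 1, "d": 2, "e": 4}
print(modify_path(["x", "u", "y"], p_i, level, w=1))
# (['x', 'u', 'y', 'c', 'd'], 'd')
```

Lemma A.1 guarantees that at least one candidate exists, so the `min` over candidates is always defined for inputs produced by the construction.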

5 Analysis 

We analyze the algorithm based on a data fusion algorithm in the modified tree $\tilde{T}$. The fusion algorithm works in phases, where each phase consists of $\kappa$ rounds. At the beginning of each phase, several source nodes may have data to send. In the first round, each source node sends its data to its respective parent at level 1. In the second round, the leaders at level 1 send the fused data to their respective parents at level 2. In general, in round $i$, where $1 \le i \le \kappa$, the leaders at level $i - 1$ send their data to their respective parents at level $i$, which then fuse the received data. At the end of round $\kappa$, the sink node $s$ has fused all the received data.
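The round structure can be simulated directly. In this sketch the hierarchy is given, as an assumption about representation, by one map per round from each data-holding node to its parent leader and the length of the connecting path (a node that is also a leader of the next level maps to itself with length 0). A leader that holds the data of $k$ sources forwards one fused message of size $f(k)$; `simulate_fusion` and the toy hierarchy are illustrative.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Hop = Dict[Hashable, Tuple[Hashable, int]]  # node -> (parent leader, path length)


def simulate_fusion(rounds: List[Hop], sources: Iterable[Hashable],
                    f: Callable[[int], float]) -> Tuple[float, Dict[Hashable, int]]:
    """Round i moves the data held by each level-(i-1) leader to its level-i parent.

    The cost of a round is the sum, over senders, of path length * f(k), where k is
    the number of sources whose data the sender holds.
    """
    count: Dict[Hashable, int] = {}
    for src in sources:
        count[src] = count.get(src, 0) + 1
    cost = 0.0
    for hop in rounds:
        received: Dict[Hashable, int] = {}
        for v, k in count.items():
            parent, length = hop[v]
            cost += length * f(k)
            received[parent] = received.get(parent, 0) + k
        count = received
    return cost, count


# Toy hierarchy: level-1 leaders a (for a, b) and c (for c, d); the sink is a.
rounds = [
    {"a": ("a", 0), "b": ("a", 1), "c": ("c", 0), "d": ("c", 1)},  # round 1
    {"a": ("a", 0), "c": ("a", 2)},                                  # round 2
]
print(simulate_fusion(rounds, ["b", "c", "d"], lambda k: k))          # (6.0, {'a': 3})
print(simulate_fusion(rounds, ["b", "c", "d"], lambda k: min(k, 1)))  # (4.0, {'a': 3})
```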

The modified tree $\tilde{T}$ naturally defines a hierarchical partition of $G$. For each node $u \in I_i$, we define the respective cluster $Z_i^u$ to contain all the nodes in $V$ that appear in the subtree of $\tilde{T}$ rooted at $u$ at level $i$. Node $u$ is the leader of $Z_i^u$. We denote by $Z_i = \bigcup_{u \in I_i} Z_i^u$ the partition of $G$ at level $i$. Let $Z = \bigcup_{0 \le i \le \kappa} Z_i$ denote a partition hierarchy of $V$ with $\kappa$ levels. Note that $Z_\kappa$ consists of one cluster containing all the nodes in $V$. Clearly, each cluster $Z_i^u$ induces a connected subgraph in $G$.

Let $A$ denote the set of all source nodes. We assume that each data item generated at a source node has size 1. Let $A_i^u = A \cap Z_i^u$. According to the data fusion algorithm, at the end of a round no later than $i$, the data items in $A_i^u$ will appear fused at node $u$ with size $f(|A_i^u|)$. Let $B_i^j$ denote the set of nodes at level $i$ which at the end of round $i$ hold data with size in $[2^j, 2^{j+1} - 1]$, where $0 \le j \le \lambda$ and $\lambda = \lceil \log |A| \rceil = O(\log n)$. Let $B_0^0 = A$. Lemma 5.3 establishes an upper bound on the communication cost of the algorithm at each round, and Lemma 5.4 a corresponding lower bound on the optimal cost.
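The sets $B_i^j$ simply bucket the level-$i$ leaders by the binary order of magnitude of the data they hold. A minimal sketch, assuming every size is at least 1 and using a hypothetical helper `size_classes`; class $j$ collects sizes in $[2^j, 2^{j+1})$, which contains the integer range $[2^j, 2^{j+1} - 1]$ used above.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Set


def size_classes(held: Dict[Hashable, float]) -> Dict[int, Set[Hashable]]:
    """Group leaders by the size of the fused data they hold after round i.

    Leader u goes to class j = floor(log2(size)), i.e. size in [2^j, 2^(j+1)).
    Assumes every size is >= 1.
    """
    classes: Dict[int, Set[Hashable]] = defaultdict(set)
    for leader, size in held.items():
        classes[math.floor(math.log2(size))].add(leader)
    return dict(classes)


print(size_classes({"u": 1, "v": 3, "w": 4, "x": 7, "y": 8}))
```

With $|A|$ unit-size sources, the largest possible size is at most $|A|$, which is why $\lambda = \lceil \log |A| \rceil$ classes suffice.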

A path $p_i$ could be intersected by multiple lower-level paths. Even though the leaders at a level $i$ are sufficiently far apart, due to intersections with other paths, a leader at level $i$ might be close to many leaders of lower-level paths. However, the number of such nearby leaders is limited: Lemmas A.3, A.4, and 5.2 establish the maximum number of pseudo-leaders in a given neighborhood. If all the nodes of a path $p_i$ belong to a cluster $Z_i^u$, we call such a path a total internal path. Let $\delta = 3 \cdot 2^i$.

Lemma 5.1. The distance in $G$ between any node $v \in Z_i^u$ and $u$ is at most $\delta = 3 \cdot 2^i$, and there is a total internal path from any node $v \in Z_i^u$ to $u$ with respect to $Z_i^u$.

Proof. Consider a path $p_{vu}$ in $Z_i^u$. In the worst case, this path could be a concatenation of several modified paths, ranging from level 0 to $i - 1$. The total length of $p_{vu}$ is then at most the sum of the maximum lengths of those segments (Lemma A.2): $\sum_{j=0}^{i-1} (3 \cdot 2^j - 2) = 3 \cdot 2^i - 2i - 3 < 3 \cdot 2^i$.

By construction, any cluster $Z_i^u$ contains those nodes of $V$ that appear in the subtree of $\tilde{T}$ rooted at $u$ at level $i$. The path $p_i \in \tilde{T}$ from any member node $v \in Z_i^u$ to $u$ contains only nodes that lie within the cluster $Z_i^u$. Also, $p_i$ translates to a path $p \in T$. □
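The geometric sum in the proof is easy to check numerically. The short sketch below (the helper name `path_bound` is illustrative) sums the per-level bound $3 \cdot 2^j - 2$ from Lemma A.2 and compares it with the closed form $3(2^i - 1) - 2i = 3 \cdot 2^i - 2i - 3$ and with $\delta = 3 \cdot 2^i$.

```python
def path_bound(i: int) -> int:
    """Sum of the per-level segment bounds 3 * 2^j - 2 (Lemma A.2) for j = 0, ..., i - 1."""
    return sum(3 * 2 ** j - 2 for j in range(i))


for i in range(1, 40):
    closed_form = 3 * 2 ** i - 2 * i - 3  # 3(2^i - 1) - 2i
    assert path_bound(i) == closed_form
    assert path_bound(i) < 3 * 2 ** i     # strictly below delta = 3 * 2^i
print(path_bound(4))  # 37
```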

Lemma 5.2. The total number of pseudo-leaders at level $i$, where $i < r$, which are inside $N_G(x, 2^r)$ is at most $2^{2\rho(r-i+3)} \cdot (\kappa - i + 1)^2$.



Proof. From Lemma A.3, there are at most $2^{\rho(r-i+3)} \cdot (\kappa - i + 1)$ path segments $p_{i+j} \in T$, $j \ge 0$, crossing $N(x, 2^r)$. From Lemma A.4, each such path segment can have multiple modified path segments at level $i$ or higher passing through it (at most $2^{\rho(r-i+1)} \cdot (\kappa - i + 1)$), so the total number of modified path segments that cross $N(x, 2^r)$ is at most $2^{2\rho(r-i+3)} \cdot (\kappa - i + 1)^2$. This also gives an upper bound on the number of pseudo-leaders at level $i$ or higher. □

Lemma 5.3. The communication cost of the data fusion algorithm at round $i$ for the nodes in $B_{i-1}^j$ is at most $|B_{i-1}^j| \cdot 6 \cdot 2^{i+j}$, for any $1 \le i \le \kappa$ and $0 \le j \le \lambda$.

Proof. Let $v$ be a leader in $I_{i-1}$ and suppose that $v \in B_{i-1}^j$. The cost of sending data from $v$ to its parent at level $i$ is at most $\delta \cdot (2^{j+1} - 1) < 3 \cdot 2^{i+j+1}$, which is the product of the total length of the path connecting $v$ to its parent (bounded by $\delta$, from Lemma 5.1) and the size of the data (bounded by $2^{j+1} - 1$). Therefore, the total cost at round $i$ for communicating the data of the nodes in $B_{i-1}^j$ is at most $|B_{i-1}^j| \cdot 3 \cdot 2^{i+j+1} = |B_{i-1}^j| \cdot 6 \cdot 2^{i+j}$. □
We proceed by estimating a lower bound on the optimal communication cost for each round $i$. Let $C^*(A, G)$ denote the optimal communication cost for the data sources $A$ in graph $G$. We start with a simple lower bound.

Lemma 5.4. If a message is sent in round $i$ from a node $v \in B_{i-1}^j$, then $C^*(A, G) \ge \max(2^{i+j-1}, 1)$.

Proof. Since $v \in B_{i-1}^j$, the message sent by $v$ during round $i$ has size at least $2^j$. If $i = 1$, then the lower bound 1 of the claim follows trivially. Suppose now that a message is sent during round $i > 1$. Let $\Gamma \subseteq A$ be the set of source nodes such that for every $x \in \Gamma$, $\mathrm{dist}_G(x, s) > 2^{i-1}$. We will show that $f(|\Gamma|) \ge 2^j$. Assume for contradiction that $f(|\Gamma|) < 2^j$. From the construction of $\tilde{T}$, it can be verified that every node within distance $2^{i-1}$ from $s$ is a member of the subtree of $\tilde{T}$ rooted at $s$ at level $i - 1$, since $s$ is always preferred as a leader choice; these are the nodes of set $A \setminus \Gamma$. Thus, the last round of the algorithm in which a message is sent carrying information from data sources in $A \setminus \Gamma$ is $i - 1$. Therefore, the message sent at round $i$ must carry information only from the data sources in $\Gamma$, so $f(|\Gamma|) \ge 2^j$, a contradiction. The smallest cost to transfer the data from the $\Gamma$ data sources to the sink $s$ is therefore at least $f(|\Gamma|) \cdot 2^{i-1} \ge 2^j \cdot 2^{i-1} = 2^{i+j-1}$. □

We continue with an alternative lower bound. We construct a new family of multi-graphs by contracting the clusters of the partition hierarchy $Z$. Let $G_i = (V_i, E_i, w)$ be the complete graph that we obtain when we contract the edges in the clusters of $Z_i$, so that each whole cluster $Z_i^u$ is replaced by $u$. Each edge $e = (x, y) \in E$ is replaced by an edge $e' = (x', y') \in E_i$, where $x'$ and $y'$ are the respective leaders of $x$ and $y$ in partition $Z_i$, and $w(x', y') = \mathrm{dist}_G(x', y')$.

Lemma 5.5. Given an arbitrary set of nodes $X \subseteq V_i$, where $0 \le i \le \kappa$, there is a subset $X' \subseteq X$ such that $|X'| \ge |X| / (2^{10\rho} \cdot (\kappa - i + 1)^2 + 1)$, and for each pair of distinct $x, y \in X'$, $\mathrm{dist}_{G_i}(x, y) > 3\delta$.

Proof. Construct a new unweighted graph $H = (X, E_H)$ such that $(x, y) \in E_H$ if $\mathrm{dist}_{G_i}(x, y) \le 3\delta$. Let $x \in X$. From Lemma 5.2, the number of pseudo-leaders of level $i$ that are in $N_G(x, 3\delta)$ is bounded by $m = 2^{2\rho((i+2)-i+3)} \cdot (\kappa - i + 1)^2 = 2^{10\rho} \cdot (\kappa - i + 1)^2$. By construction of $G_i$, $\mathrm{dist}_{G_i}(x, y) \le 3\delta$ only if $y \in N_G(x, 3\delta)$. Therefore, the degree of $H$ is bounded by $m$. Hence, $H$ admits an $(m + 1)$-coloring, which implies that there is an independent set $X' \subseteq X$ in $H$ of size $|X'| \ge |X| / (2^{10\rho} \cdot (\kappa - i + 1)^2 + 1)$. Clearly, for each $x, y \in X'$, $\mathrm{dist}_{G_i}(x, y) > 3\delta$. □
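The existence of $X'$ can equivalently be shown greedily instead of via the coloring used above: repeatedly keep a node and discard its neighbors in $H$. Each kept node discards at most $m$ others, so at least $|X|/(m+1)$ nodes survive. The sketch below is this swapped-in greedy argument on toy data (integers on a line stand in for the nodes of $G_i$, $|a - b|$ for $\mathrm{dist}_{G_i}$, and the threshold 2 for $3\delta$); `sparse_subset` is a hypothetical name.

```python
from typing import Callable, Hashable, List, Sequence


def sparse_subset(X: Sequence[Hashable], dist: Callable[[Hashable, Hashable], float],
                  threshold: float) -> List[Hashable]:
    """Greedy independent set of the conflict graph H (x ~ y iff dist(x, y) <= threshold).

    Each pick discards itself and its neighbours among the remaining nodes, so if every
    node of H has degree at most m, at least len(X) / (m + 1) nodes are picked.
    """
    remaining = list(X)
    chosen: List[Hashable] = []
    while remaining:
        x = remaining.pop(0)
        chosen.append(x)
        remaining = [y for y in remaining if dist(x, y) > threshold]
    return chosen


points = list(range(20))  # toy nodes on a line; each has at most m = 4 conflicts
picked = sparse_subset(points, lambda a, b: abs(a - b), threshold=2)
print(picked)  # [0, 3, 6, 9, 12, 15, 18]
```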

Lemma 5.6. $C^*(A, G) \ge 2^{i+j-1} \left\{ |B_i^j| / [2^{10\rho} \cdot (\kappa - i + 1)^2 + 1] - 1 \right\}$, for every $0 \le i \le \kappa$ and $0 \le j \le \lambda$.

Proof. Consider a particular $B_i^j$. Let $\zeta = |B_i^j|$ and $B_i^j = \{u_1, \ldots, u_\zeta\}$. We have that $A_i^{u_q}$ data sources are in $Z_i^{u_q}$ and $f(|A_i^{u_q}|) \in [2^j, 2^{j+1} - 1]$. Let $X = \bigcup_{q=1}^{\zeta} A_i^{u_q}$. Let $C^*(X, G)$ denote the optimal cost of fusing the data from the $X$ data sources to the sink node $s$ in graph $G$.

We transform the data fusion problem with the $X$ sources into a new data fusion problem in $G_i$, where all data sources in $A_i^{u_q}$ are relocated to the respective leader node $u_q$. The size of the data held at the leader node is $f(|A_i^{u_q}|)$. Let $\hat{X} = B_i^j$. The optimal cost of fusing the data from $\hat{X}$ in $G_i$ is no larger than the cost of fusing the data from $X$ in $G$, namely, $C^*(\hat{X}, G_i) \le C^*(X, G)$, since graph $G_i$ is obtained from $G$ by contracting edges, and thus the optimal communication cost in $G_i$ cannot be worse than in $G$.

From Lemma 5.5, there is a subset $\hat{X}' \subseteq \hat{X}$ such that $|\hat{X}'| \ge |\hat{X}| / [2^{10\rho} \cdot (\kappa - i + 1)^2 + 1]$, and for each distinct $x, y \in \hat{X}'$, $\mathrm{dist}_{G_i}(x, y) > 3\delta$. Let $x, y \in \hat{X}'$ be any two nodes, and let $u \in Z_i^x$, $v \in Z_i^y$; then $\mathrm{dist}_G(u, v) > \delta$, since each cluster has radius at most $\delta$ (Lemma 5.1). The cost of fusing the data stored in $\hat{X}'$ to the sink $s$ is at least the weight of the Steiner tree $T_{St}$ in $G_i$ that connects the set of nodes $\hat{X}'$ (including $s$ in the Steiner tree would only increase its cost), multiplied by $2^j$, which is a lower bound on the size of the data from each source in $\hat{X}'$. The total weight of $T_{St}$ is $w(T_{St}) \ge \delta(|\hat{X}'| - 1) \ge \delta \{ |B_i^j| / [2^{10\rho} \cdot (\kappa - i + 1)^2 + 1] - 1 \}$. The claim follows, since $w(T_{St}) \cdot 2^j \le C^*(\hat{X}', G_i) \le C^*(\hat{X}, G_i) \le C^*(X, G) \le C^*(A, G)$ and $\delta \cdot 2^j = 3 \cdot 2^{i+j} \ge 2^{i+j-1}$. □

Theorem 5.7 (Approximation of Spanning Tree). Tree $\tilde{T}$ is an $\alpha$-OST, where $\alpha = O(2^{10\rho} \log^3 D \cdot \log n)$.

Proof. Suppose that there is at least one data source different from the sink $s$. The algorithm runs in $\kappa = O(\log D)$ rounds. Let $Q_{i-1}^j$ be the cost of the algorithm when transferring data from the level-$(i-1)$ nodes $B_{i-1}^j$ to their respective parent leaders at level $i$. The total cost of the algorithm at round $i$ is $Q_i = \sum_{j=0}^{\lambda} Q_{i-1}^j$. From Lemma 5.3,

$$Q_{i-1}^j \le |B_{i-1}^j| \cdot 6 \cdot 2^{i+j}. \qquad (1)$$

Let $Q_{i'-1}^{j'} = \max_{i,j} Q_{i-1}^j$. Since at least one message is sent in the algorithm, $Q_{i'-1}^{j'} \ge 1$. From Lemma 5.4,

$$C^*(A, G) \ge \max(2^{i'+j'-1}, 1). \qquad (2)$$

From Lemma 5.6, we also have

$$C^*(A, G) \ge 2^{i'+j'-1} \left( \frac{|B_{i'-1}^{j'}|}{2^{10\rho} \cdot (\kappa - i' + 1)^2 + 1} - 1 \right). \qquad (3)$$

If $|B_{i'-1}^{j'}| < 2(2^{10\rho} \cdot (\kappa - i' + 1)^2 + 1)$, then from Equations (1) and (2) we have

$$\frac{Q_{i'-1}^{j'}}{C^*(A, G)} < \frac{2(2^{10\rho} \cdot (\kappa - i' + 1)^2 + 1) \cdot 6 \cdot 2^{i'+j'}}{2^{i'+j'-1}} \le 24 \cdot (2^{10\rho} \cdot (\kappa + 1)^2 + 1). \qquad (4)$$

If $|B_{i'-1}^{j'}| \ge 2(2^{10\rho} \cdot (\kappa - i' + 1)^2 + 1)$, then from (1) and (3) we also obtain (4). Let $Q = \sum_i Q_i$ be the total cost of the algorithm. Since there are $\kappa$ rounds and $\lambda + 1$ size classes, Equation (4) gives $Q \le \kappa(\lambda + 1) Q_{i'-1}^{j'} \le \kappa(\lambda + 1) \cdot C^*(A, G) \cdot 24 \cdot (2^{10\rho} \cdot (\kappa + 1)^2 + 1)$. Thus, since $\kappa = O(\log D)$ and $\lambda = O(\log n)$, we have

$$\frac{Q}{C^*(A, G)} \le \kappa(\lambda + 1) \cdot 24 \cdot (2^{10\rho} \cdot (\kappa + 1)^2 + 1) = O(2^{10\rho} \log^3 D \cdot \log n).$$

□

If $f$ is the constant cost function, then $\lambda = 0$, and we obtain the following corollary.

Corollary 5.8. For the constant cost function $f(x) = c$, Theorem 5.7 gives an $O(2^{10\rho} \log^3 D)$-oblivious Steiner tree.



6 Simulation Results 



We simulated the Oblivious Spanning Tree and compared its performance (fusion cost) with GRID.GIST [Jia et al. 2006], the Maximum Matching Algorithm [Goel and Estrin 2003], and other common trees such as the MST (Minimum Spanning Tree) and SPT (Shortest-Paths Tree). We used a 2-D grid topology for our simulation, using NetworkX [Hagberg et al. 2008]. 2-D grids are a special case of doubling-dimension graphs, and the Steiner tree problem on them corresponds to a variant called the Rectilinear Steiner Problem (RSP), in which the tree consists only of vertical and horizontal segments interconnecting all points; RSP was proved to be NP-complete [Garey and Johnson 1977]. Since calculating a minimum-weight tree structure in a 2-D grid topology (a doubling-dimension graph) is essentially an RSP, the problem we are addressing is NP-hard even on this topology.

We build a single spanning tree in a grid with 1600 nodes and simulate it for random sets of up to 1445 randomly placed data sources. Note that GRID.GIST is an algorithm designed specifically for grids, whereas ours is a general algorithm; consequently, GRID.GIST performs slightly better than OST (Figure 2 in the appendix).
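For readers who want to reproduce the flavor of this setup, the sketch below builds a 1600-node (40 x 40) grid in plain Python and evaluates only the SPT baseline for one random source set under two fusion functions. It is not the authors' simulation: the sink position, the random seed, the source count, and all function names are hypothetical choices, edge weights are assumed to be 1, and OST, GRID.GIST, and the matching algorithm are not reproduced. NetworkX is not required here.

```python
import random
from collections import deque


def grid_graph(rows, cols):
    """4-neighbour grid as an adjacency dict keyed by (row, col)."""
    return {(r, c): [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            for r in range(rows) for c in range(cols)}


def shortest_paths_tree(G, root):
    """BFS tree (child -> parent); with unit weights this is a shortest-paths tree."""
    parent, seen, queue = {}, {root}, deque([root])
    while queue:
        u = queue.popleft()
        for w in G[u]:
            if w not in seen:
                seen.add(w)
                parent[w] = u
                queue.append(w)
    return parent


def fusion_cost(parent, sources, f):
    """Unit edge weights: each edge costs f(number of sources below it)."""
    load = {}
    for v in set(sources):
        while v in parent:
            load[v] = load.get(v, 0) + 1
            v = parent[v]
    return sum(f(k) for k in load.values())


G = grid_graph(40, 40)   # 1600 nodes, as in the experiment
sink = (0, 0)            # hypothetical sink placement
spt = shortest_paths_tree(G, sink)
rng = random.Random(1)   # fixed seed for repeatability
sources = rng.sample(sorted(G), 100)
no_fusion = fusion_cost(spt, sources, lambda k: k)            # f(k) = k
full_fusion = fusion_cost(spt, sources, lambda k: min(k, 1))  # f(k) = 1 for k >= 1
print(f"SPT cost with f(k)=k: {no_fusion}, with f(k)=min(k,1): {full_fusion}")
```

With $f(k) = k$ the SPT cost equals the sum of the sources' hop distances to the sink, which makes it a convenient sanity check for any fusion-cost implementation.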
7 Conclusions and Future Work



We studied the problem of computing near-optimal fusion trees when the number of sources and the fusion cost function are unknown, in the context of doubling-dimension graphs. We have demonstrated that a simple, deterministic, polynomial-time algorithm based on independent sets can provide a near-optimal data structure for data fusion (under the assumptions of a concave cost function and a doubling-dimension graph). The algorithm constructs a spanning tree, which is then altered to produce a modified tree. We have shown that this algorithm guarantees an $O(2^{10\rho} \log^3 D \cdot \log n)$-approximation of the optimal cost. As part of our future work, we are looking into the same problem on planar graphs, extending it to multiple sinks, and incorporating fault tolerance and load balancing.
A Observations and Proofs



Lemma A.1 (Presence of a Pseudo-Leader). The ModifyPath($p_\omega$, $p_i$) function guarantees the selection of a level-$(\omega + 1)$ pseudo-leader.

Proof. Suppose path $p_\omega$ intersects a higher-level path $p_i$. Let the start-node of $p_\omega$ be $u$, and let the end-node of $p_i$ be $v$. Note that a path $p_i$ goes from level $i$ to level $i + 1$. There are two cases for the presence of a pseudo-leader in $p_i$. If the level of $v$ is $\omega + 1$, then $v$ itself acts as a pseudo-leader for $u$. If the level of $v$ is greater than $\omega + 1$, then $p_i$ must have some nodes (between its end-nodes) that have been assigned to level $\omega + 1$ (by the AssignLevels function). Hence, in either case, a pseudo-leader is guaranteed to be found in $p_i$ for $u$. □




Figure 1: Pruning and Tree Modification. (a) Intersecting paths. (b) Pruned paths $y, b$ and $x, v'$. (c) Modified paths.



Lemma A.2 (Upper Bound on $|p'_\omega|$). The length of $p'_\omega$ is at most $3 \cdot 2^\omega - 2$.

Proof. Consider a path $p_\omega$ that starts at $x \notin p_i$ and intersects another path $p_i$ at $y \in p_i$. Since $p_\omega$ is a pruned path, its distance from $x$ to the intersection point $y$ is at most $2^\omega - 1$ (if it were $2^\omega$ or more, point $y$ would have been its original leader). ModifyPath seeks the nearest level-$(\omega + 1)$ node (pseudo-leader) on $p_i$ from $y$ (Lemma A.1). Note that $y$ itself cannot be the pseudo-leader for $x$ because, if it were, then $p_\omega$ would not have been a pruned path. The distance from $y$ to a pseudo-leader $v$ on $p_i$ is at most $2^{\omega+1} - 1$, because if this distance were more than $2^{\omega+1} - 1$, we would have found another pseudo-leader $v'$ that is $2^{\omega+1}$ away from $v$ and closer to $y$. This is due to the presence of $2^{\omega+1}$-independent nodes on path $p_i$ computed by AssignLevels. Note that $y$ cannot be an end-node of $p_i$, while $v$ could be one of the end-nodes of $p_i$. Hence, the length of $p'_\omega$ is at most $2^\omega - 1 + 2^{\omega+1} - 1 = 3 \cdot 2^\omega - 2$. Note that $p_i$ itself could be a stretched pruned path, and the upper bound holds irrespective of the length of $p_i$. □

Figure 1 gives an example of intersecting paths and their modification to reach a pseudo-leader and form a modified path.

Consider an arbitrary node $x \in G$ with its neighborhood $N_G(x, 2^r)$, where $r > 0$.

Lemma A.3 (Max path segments). The total number of path segments $p \in T$ at level $i$ or higher that cross $N_G(x, 2^r)$ is at most $2^{\rho(r-i+3)} \cdot (\kappa - i + 1)$.



Function ModifyPath(p_m, p_i)

Input: Paths p_i and p_m, where p_m intersects p_i and i > m.
Output: A modified path p_m.

// Let p_m start from x ∉ p_i and intersect p_i at y ∈ p_i along its path to its leader
v_{m+1} ← from y, identify the nearest level-(m + 1) node v ∈ p_i;
p_m^a ← subpath from x to y in p_m;
p_m^b ← subpath from y to v_{m+1} in p_i;
p_m ← p_m^a + p_m^b;   // concatenate p_m^a and p_m^b
return p_m;

Function AssignLevels(p_i, H, i)

Input: Path p_i, set of end-nodes H of p_i, level i.
Output: Assignment of levels to all nodes in p_i.

L_λ ← ∅;   // set of 2^λ-independent nodes
for λ ← (i − 1) down to 0 do
    // find 2^λ-independent nodes at levels λ = (i − 1), (i − 2), ..., 1, 0
    L_λ ← MIS(p_i, H, 2^λ);
    assign level λ to the nodes in L_λ;



Proof. By construction, the length of a path $p_{i+j} \in T$ is at most $2^{i+j}$, where $0 \le j \le \kappa - i$, and there is at most one leader $\ell_{i+j} \in I_{i+j}$ within $N(x, 2^{i+j}/2)$. Since we are counting the path segments $p_{i+j}$ that go through $N(x, 2^r)$, consider the larger neighborhood $N(x, 2^{i+j} + 2^r)$ and determine the number of neighborhoods of radius $2^{i+j}/2$ needed to cover it. Because of the doubling dimension, this number is at most $2^{\rho \log \frac{2^{i+j} + 2^r}{2^{i+j}/2}} \le 2^{\rho \log 2^{r-i+3}} = 2^{\rho(r-i+3)}$. Summing over all paths that span the levels from $i$ to $\kappa$, the total number of path segments that cross $N(x, 2^r)$ is at most $2^{\rho(r-i+3)} \cdot (\kappa - i + 1)$. □

Lemma A.4 (Max modified paths in a path segment). Consider a path segment $p \in T$ that crosses $N_G(x, 2^r)$. The total number of modified paths $p' \in \tilde{T}$ at level $i$ or higher that use nodes in $p \cap N_G(x, 2^r)$ is at most $2^{\rho(r-i+1)} \cdot (\kappa - i + 1)$.



Proof. Let $Q = p \cap N_G(x, 2^r)$. From Lemma A.2, the maximum length of any modified path $p_{i+j}$ is $3 \cdot 2^{i+j} - 2 < 4 \cdot 2^{i+j} = 2^{i+j+2}$. To find the total number of modified paths $p_{i+j}$ that pass through $Q$, we consider the larger neighborhood $N(x, 2^{i+j+2} + 2^r)$ and find the number of neighborhoods $N(y, 2^{i+j+2}/2)$ that cover it. Since each $p_{i+j}$ has its start node in $I_{i+j}$, by the doubling property of the graph this number is at most $2^{\rho \log \frac{2^{i+j+2} + 2^r}{2^{i+j+1}}} \le 2^{\rho(r-i+1)}$. Since $j \in [0, \kappa - i]$, the total number of paths is at most $2^{\rho(r-i+1)} \cdot (\kappa - i + 1)$. □
Figure 2: Fusion Cost for varying set of source nodes in a 1600-node grid.